Implementing K-means Algorithm using Row store and Column store databases: A case study

Author

  • Rajappa Velur
Abstract

K-means clustering is an important algorithm for identifying structure in data, and it is the simplest clustering algorithm [8]. It takes as input a predefined number of clusters, the K in its name. "Means" refers to averages: each cluster is represented by the average location of all its members. In this work, a novel approach to seeding the clusters with a latent data structure is proposed. By providing near-optimal initial cluster centers, this is expected to reduce both the need to specify the number of clusters a priori and the time to convergence. These algorithms are also tested on the latest standard for data warehouses: column store databases.
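
As a rough illustration of the standard K-means loop the abstract describes (K given up front, each cluster represented by the mean of its members), here is a minimal Python sketch. It is not the seeding approach proposed in the paper: the function names `kmeans` and `assign`, the random seeding from input points, and the iteration cap are all assumptions made for this example.

```python
import random


def assign(points, centers):
    """Group points by the index of their nearest center (squared Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
        )
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: seed the centers by sampling k input points at random.
    # The paper proposes a different, structure-based seeding not shown here.
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Calling `kmeans(points, 2)` on a list of 2-D tuples returns the two centers and the points grouped by center. With random seeding the number of iterations depends on the starting centers, which is the cost that better seeding aims to reduce.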


Similar resources

Column Oriented Database: Implementation and Performance Analysis

The volume of data in an organization is growing rapidly. So does the number of users who need to access and analyze this data. IT systems are used more and more intensively, in order to answer more numerous and complex demands needed to make critical business decisions. Data analysis and business reporting need more and more resources. Therefore, better, faster and more effective alternatives ha...


Column-store support for RDF data management: not all swans are white

This paper reports on the results of an independent evaluation of the techniques presented in the VLDB 2007 paper “Scalable Semantic Web Data Management Using Vertical Partitioning”, authored by D. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach [1]. We revisit the proposed benchmark and examine both the data and query space coverage. The benchmark is extended to cover a larger portion of the...


A Storage Advisor for Hybrid-Store Databases

With the SAP HANA database, SAP offers a high-performance in-memory hybrid-store database. Hybrid-store databases (that is, databases supporting row- and column-oriented data management) are getting more and more prominent. While the columnar management offers high-performance capabilities for analyzing large quantities of data, the row-oriented store can handle transactional point queries as well ...


Role of Store Image and Service Quality on Imaging Goods with Private Label and Its Influence on Promoting Purchase Intention: A Case Study of Hyperstar Customers

Retailers' private-label brands have significantly boosted market share in recent years. Creating new brands for goods or services provides differentiation from similar distributors. The main aim of this paper is to test which component is more effective in shaping consumers' purchase intention when a private label is used for the goods' image. This research data was collected by prior st...


A Comparison of C-Store and Row-Store in a Common Framework

Recently, a "column store" system called C-Store has shown significant performance benefits by utilizing storage optimizations for a read-mostly query workload. The authors of the C-Store paper compared their optimized column store to a commercial row store RDBMS that is optimized for a mixture of reads and writes, which obscures the relative benefits of row and column stores. In this paper, we ...



Publication date: 2009